Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation

Authors

  • Joern Wuebker
  • Matthias Huck
  • Stephan Peitz
  • Malte Nuhn
  • Markus Freitag
  • Jan-Thorsten Peter
  • Saab Mansour
  • Hermann Ney
Abstract

We present Jane 2, an open source toolkit supporting both the phrase-based and the hierarchical phrase-based paradigm for statistical machine translation. It is implemented in C++ and provides efficient decoding algorithms and data structures. This work focuses on the description of its phrase-based functionality. In addition to the standard pipeline, including phrase extraction and parameter optimization, Jane 2 contains several state-of-the-art extensions and tools. Forced alignment phrase training can considerably reduce rule table size while learning the translation scores in a more principled manner. Word class language models can be used to integrate longer context with a reduced vocabulary size. Rule table interpolation is applicable for different tasks, e.g. domain adaptation. The decoder distinguishes between lexical and coverage pruning and applies reordering constraints for efficiency.
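The abstract does not say how rule table interpolation is carried out in Jane 2. As a minimal sketch of one common variant, linear interpolation, the example below mixes an in-domain and an out-of-domain table with a single weight. The `RuleTable` type, the function name `interpolate_rule_tables`, and the toy entries are assumptions made for illustration only and are not part of the toolkit.

```python
from typing import Dict, Tuple

# Hypothetical in-memory rule table (not Jane's format):
# (source phrase, target phrase) -> translation probability.
RuleTable = Dict[Tuple[str, str], float]


def interpolate_rule_tables(
    in_domain: RuleTable, out_domain: RuleTable, weight: float
) -> RuleTable:
    """Linearly interpolate two rule tables.

    `weight` is the share given to the in-domain table. A phrase pair
    missing from one table contributes probability 0.0 from that table.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    merged: RuleTable = {}
    # Visit every phrase pair that occurs in at least one of the tables.
    for pair in in_domain.keys() | out_domain.keys():
        merged[pair] = (
            weight * in_domain.get(pair, 0.0)
            + (1.0 - weight) * out_domain.get(pair, 0.0)
        )
    return merged


# Toy data, invented for this example.
in_domain = {("haus", "house"): 0.5, ("haus", "home"): 0.5}
out_domain = {("haus", "house"): 1.0}
mixed = interpolate_rule_tables(in_domain, out_domain, weight=0.5)
```

In a domain adaptation setting, the weight would typically be tuned on held-out in-domain data; the value 0.5 here is arbitrary.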

Similar resources

Hierarchical Phrase-Based Translation with Jane 2

In this paper, we give a survey of several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH’s open source statistical machine translation toolkit. We focus on the following techniques: insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative...

Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models

We present Jane, RWTH’s hierarchical phrase-based translation system, which has been open sourced for the scientific community. This system has been in development at RWTH for the last two years and has been successfully applied in different machine translation evaluations. It includes extensions to the hierarchical approach developed by RWTH as well as other research institutions. In this pape...

An Open-Source Hierarchical Phrase-Based Translation System

We present an open source translation system that provides a clean-room implementation of the hierarchical phrase-based statistical translation model introduced in (Chiang, 2005) and refined in (Chiang, 2007). To our knowledge this is the first freely available hierarchical phrase-based translation system which implements cube pruning. We introduce extensions to (Chiang, 2007) to take advantage...

Investigations on hierarchical phrase-based machine translation

In this thesis we investigate the hierarchical phrase-based approach to machine translation, with special attention to the search problem. This approach is nowadays one of the most widely applied for statistical machine translation, and thus a detailed study helps in advancing the state-of-the-art in the field. Two are the most widely used algorithms for translating using the hierarchical phras...

A Cocktail of Deep Syntactic Features for Hierarchical Machine Translation

In this work we review and compare three additional syntactic enhancements for the hierarchical phrase-based translation model, which have been presented in the last few years. We compare their performance when applied separately and study whether the combination may yield additional improvements. Our findings show that the models are complementary, and their combination achieves an increase of ...

Publication date: 2012